
    Extracting knowledge from complex unstructured corpora: Text classification and a case study on the safeguarding domain

    Advances in internet, data collection, and sharing technologies have led to an increase in the amount of unstructured information in the form of news, articles, and social media. Additionally, many specialised domains, such as medicine, law, and the social sciences, use unstructured documents as a main platform for collecting, storing, and sharing domain-specific knowledge. However, the manual processing of these documents is a resource-consuming and error-prone process. This is especially apparent when the volume of documents that need annotating constantly increases over time. Therefore, automated information extraction techniques have been widely used to efficiently analyse text and discover patterns. Specifically, text classification methods have become valuable for specialised domains for organising content, such as patient notes, and for supporting fast topic-based retrieval of information. However, many specialised domains suffer from a lack of data and from class imbalance because documents are hard to obtain. In addition, the manual annotation needs to be performed by experts, which can be costly. This makes the application of supervised classification approaches a challenging task. In this thesis, we research methods for improving the performance of text classifiers for specialised domains with limited amounts of data and highly domain-specific terminology, where the annotation of documents is performed by domain experts. First, we study the applicability of traditional feature enhancement approaches using publicly available resources for improving classifiers' performance for specialised domains. Then, we conduct extensive research into the suitability of existing classification algorithms and the importance of both domain-specific and task-specific data for few-shot classification, which helps identify classification strategies applicable to small datasets. This gives the basis for the development of a methodology for improving a classifier's performance in few-shot settings using text generation-based data augmentation techniques. Specifically, we aim to improve the quality of generated data by using strategies for selecting class-representative samples from the original dataset, which are then used to produce additional training instances. We perform extensive analysis, considering multiple strategies, datasets, and few-shot text classification settings. Our study uses a corpus of safeguarding reports as an exemplary case study of a specialised domain with a small volume of data. The safeguarding reports contain valuable information about learning experiences and reflections on tackling serious crimes involving children and vulnerable adults. They carry great potential to improve multiagency work and help develop better crime prevention strategies. However, the lack of centralised access and the constant growth of the collection make the manual analysis of the reports unfeasible. Therefore, we collaborated with the Crime and Security Research Institute (CSRI) at Cardiff University on the creation of a Wales Safeguarding Repository (WSR) to provide centralised access to the safeguarding reports and a means for automatic information extraction. The aim of the repository is to make the collection efficiently searchable and thus help free up resources and assist practitioners from health and social care agencies in making faster and more accurate decisions. In particular, we apply methods identified in the thesis to support automated annotation of the documents using a thematic framework created by subject-matter experts. Our close work with domain experts throughout the thesis allowed us to incorporate experts' knowledge into classification and augmentation techniques, which proved beneficial for the improvement of automated supervised methods for specialised domains.

    FIMCAR II: Accident Analysis

    For the assessment of vehicle safety in frontal collisions, compatibility (which consists of self and partner protection) between opponents is crucial. Although compatibility has been analysed worldwide for years, no final assessment approach has been defined to date. Taking into account the European Enhanced Vehicle safety Committee (EEVC) compatibility and frontal impact working group (WG15) and the EC funded FP5 VC-COMPAT project activities, two test approaches have been identified as the most promising candidates for the assessment of compatibility. Both are composed of an off-set and a full overlap test procedure. In addition, another procedure (a test with a moving deformable barrier) is getting more attention in today’s research programmes. The overall objective of the FIMCAR project is to complete the development of the candidate test procedures and propose a set of test procedures suitable for regulatory application to assess and control a vehicle’s frontal impact and compatibility crash safety. In addition, an associated cost benefit analysis should be performed. The specific objectives of the work reported in this deliverable were:

    • Determine if previously identified compatibility issues are still relevant in current vehicle fleet
        o Structural interaction
        o Frontal force matching
        o Compartment strength in particular for light cars
    • Determine nature of injuries and injury mechanisms
        o Body regions injured
        o Injury mechanism
            ▪ Contact with intrusion
            ▪ Contact
            ▪ Deceleration / restraint induced

    The main data sources for this report were the CCIS and Stats 19 databases from Great Britain and the GIDAS database from Germany. The different sampling and reporting schemes for the detailed databases (CCIS & GIDAS) sometimes do not allow for direct comparisons of the results. However, the databases are complementary: CCIS captures more severe collisions, highlighting structure and injury issues, while GIDAS provides detailed data for a broader range of crash severities. The following results represent the critical points for further development of test procedures in FIMCAR.

    Re-examining advice to complete antibiotic courses: a qualitative study with clinicians and patients

    BACKGROUND: Antibiotic treatment duration may sometimes be longer than needed. Stopping antibiotics early, rather than completing pre-set antibiotic courses, may help reduce unnecessary exposure to antibiotics and antimicrobial resistance (AMR). AIM: To identify clinicians' and patients' views on stopping antibiotics when better (SAWB) for urinary tract infections (UTIs), and to explore comparisons with other acute infections. DESIGN & SETTING: An exploratory qualitative study with general practice clinicians and patients in England. METHOD: Primary care clinicians and patients who had recent UTI experience were recruited in England. Remote one-to-one interviews with clinicians and patients, and one focus group with patients, were conducted. Data were audio-recorded, transcribed, and analysed thematically. RESULTS: Eleven clinicians (seven GPs) and 19 patients (14 with experience of recurrent and/or chronic UTIs) were included. All participants considered SAWB unfamiliar and contradictory to well-known advice to complete antibiotic courses, but were interested in the evidence for risks and benefits of SAWB. Clinicians were amenable if evidence and guidelines supported it, whereas patients were more averse because of concerns about the risk of UTI recurrence and/or complications and AMR. Participants viewed SAWB as potentially more appropriate for longer antibiotic courses and other infections (with longer courses and lower risk of recurrence and/or complications). Participants stressed the need for unambiguous advice and for SAWB to be part of shared decision making and personalised advice. CONCLUSION: Patients were less accepting of SAWB, whereas clinicians were more amenable to it. Patients and clinicians require good evidence that this novel approach to self-determining antibiotic duration is safe and beneficial. If evidence based, SAWB should be offered with an explanation of why the advice differs from the ‘complete the course’ instruction, and a clear indication of when exactly to stop antibiotics should be given.

    Knowledge extraction from a small corpus of unstructured safeguarding reports

    This paper presents results on the performance of a range of analysis tools for extracting entities and sentiments from a small corpus of unstructured safeguarding reports. We use sentiment analysis to identify strongly positive and strongly negative segments in an attempt to attribute patterns in the extracted sentiments to specific entities. We use entity extraction to identify key entities. We evaluate tool performance against non-specialist human annotators. An initial study comparing inter-human agreement against inter-machine agreement shows higher overall scores from human annotators than from software tools. However, the degree of consensus between the human annotators for entity extraction is lower than expected, which suggests a need for trained annotators. For sentiment analysis, the annotators reached higher agreement when annotating descriptive sentences than when annotating reflective sentences, while the inter-tool agreement was similarly low for the two sentence types. The poor performance of the entity extraction and sentiment analysis approaches points to the need for domain-specific approaches to knowledge extraction on these kinds of documents. However, there is currently a lack of pre-existing ontologies in the safeguarding domain. Thus, our future focus is the development of such a domain-specific ontology.

    Cop1 constitutively regulates c-Jun protein stability and functions as a tumor suppressor in mice

    Biochemical studies have suggested conflicting roles for the E3 ubiquitin ligase constitutive photomorphogenesis protein 1 (Cop1; also known as Rfwd2) in tumorigenesis, providing evidence for both the oncoprotein c-Jun and the tumor suppressor p53 as its targets. Here we present what we believe to be the first in vivo investigation of the role of Cop1 in cancer etiology. Using an innovative genetic approach to generate an allelic series of Cop1, we found that Cop1 hypomorphic mice spontaneously developed malignancy at a high frequency in the first year of life and were highly susceptible to radiation-induced lymphomagenesis. Further analysis revealed that c-Jun was a key physiological target for Cop1 and that Cop1 constitutively kept c-Jun at low levels in vivo and thereby modulated c-Jun/AP-1 transcriptional activity. Importantly, Cop1 deficiency stimulated cell proliferation in a c-Jun-dependent manner. Focal deletions of COP1 were observed at significant frequency across several cancer types, and COP1 loss was determined to be one of the mechanisms leading to c-Jun upregulation in human cancer. We therefore conclude that Cop1 is a tumor suppressor that functions, at least in part, by antagonizing c-Jun oncogenic activity. In the absence of evidence for a genetic interaction between Cop1 and p53, our data strongly argue against the use of Cop1-inhibitory drugs for cancer therapy.

    Defining myocardial tissue abnormalities in end-stage renal failure with cardiac magnetic resonance imaging using native T1 mapping

    Noninvasive quantification of myocardial fibrosis in end-stage renal disease is challenging. Gadolinium contrast agents previously used for cardiac magnetic resonance imaging (MRI) are contraindicated because of an association with nephrogenic systemic fibrosis. In other populations, increased myocardial native T1 times on cardiac MRI have been shown to be a surrogate marker of myocardial fibrosis. We applied this method to 33 incident hemodialysis patients and 28 age- and sex-matched healthy volunteers who underwent MRI at 3.0 T. Native T1 relaxation times and feature tracking–derived global longitudinal strain as potential markers of fibrosis were compared and associated with cardiac biomarkers. Left ventricular mass indices were higher in the hemodialysis group than in the control group. Global, septal, and midseptal T1 times were all significantly higher in the hemodialysis group (global T1 hemodialysis 1171 ± 27 ms vs. 1154 ± 32 ms; septal T1 hemodialysis 1184 ± 29 ms vs. 1163 ± 30 ms; and midseptal T1 hemodialysis 1184 ± 34 ms vs. 1161 ± 29 ms). In the hemodialysis group, T1 times correlated with left ventricular mass indices. Septal T1 times correlated with troponin and electrocardiogram-corrected QT interval. The peak global longitudinal strain was significantly reduced in the hemodialysis group (hemodialysis -17.7 ± 5.3% vs. -21.8 ± 6.2%). For hemodialysis patients, the peak global longitudinal strain significantly correlated with left ventricular mass indices (R = 0.426), and a trend was seen for correlation with galectin-3, a biomarker of cardiac fibrosis. Thus, cardiac tissue properties of hemodialysis patients consistent with myocardial fibrosis can be determined noninvasively and associated with multiple structural and functional abnormalities.

    The cost-effectiveness of screening for ovarian cancer: results from the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS)

    Background: To assess the within-trial cost-effectiveness of an NHS ovarian cancer screening (OCS) programme using data from UKCTOCS and extrapolate results based on average life expectancy. Methods: Within-trial economic evaluation of no screening (C) versus either (1) an annual OCS programme using transvaginal ultrasound (USS) or (2) an annual ovarian cancer multimodal screening programme with serum CA125 interpreted using a risk algorithm (ROCA) and transvaginal ultrasound as a second-line test (MMS), plus comparison of lifetime extrapolation of the no-screening arm and the MMS programme using both a predictive and a Markov model. Results: Using a CA125-ROCA cost of £20, the within-trial results show USS to be strictly dominated by MMS, with the MMS versus C comparison returning an incremental cost-effectiveness ratio (ICER) of £91,452 per life year gained (LYG). If the CA125-ROCA unit cost is reduced to £15, the ICER becomes £77,818 per LYG. Predictive extrapolation over the expected lifetime of the UKCTOCS women returns an ICER of £30,033 per LYG, while Markov modelling produces an ICER of £46,922 per quality-adjusted life year (QALY). Conclusions: Analysis suggests that, after accounting for the lead time required to establish full mortality benefits, a national OCS programme based on the MMS strategy quickly approaches the current NICE thresholds for cost-effectiveness when extrapolated out to lifetime, as compared to the within-trial ICER estimates. Whether MMS could be recommended on economic grounds would depend on the confirmation and size of the mortality benefit at the end of an ongoing follow-up of the UKCTOCS cohort.